# Real-time Processing

Ultravox v0.5 Llama 3.2 1B GGUF

Ultravox v0.5 is an audio-to-text model built on the Llama 3.2 1B architecture, focused on efficient speech transcription.

Speech Recognition

Mediapipe Selfie Segmentation Landscape

A lightweight portrait segmentation model in ONNX format, specifically optimized for separating people from backgrounds in landscape images.

Image Segmentation

Vitpose Base Simple

A lightweight pose estimation model based on ViT architecture for human keypoint detection

Pose Estimation

Coreml Sam2 Tiny

SAM 2 Tiny is the Core ML version of the general-purpose segmentation model for images and videos released by FAIR, optimized for mobile applications

Image Segmentation

Genrevim Music Detection DistilHuBERT

This model is a fine-tuned audio classification model based on DistilHuBERT, specifically designed to distinguish between music and non-music audio.

Audio Classification

Yolov8n Handwritten Text Detection

An object detection model based on YOLOv8, specifically designed for detecting handwritten text content

Object Detection

Trocr Base Plate Number

An example vision model for recognizing vehicle license plates, capable of extracting license plate numbers from images.

Text Recognition

Tiny Random Vits

An open-source model released under the Apache-2.0 license; its specific functionality depends on the underlying model.

Large Language Model

Ssast Audioset Librispeech 16 16

An audio classification model for classifying and recognizing audio data.

Audio Classification

Ast Finetuned Speech Commands V2

A voice command recognition model based on AST architecture, optimized for web deployment in ONNX format

Audio Classification

Pyannote Speaker Diarization Endpoint

Speaker diarization model based on pyannote.audio 2.0, used for automatically detecting and segmenting different speakers in audio

Speaker Analysis

Whitebox Cartoonizer

A TensorFlow SavedModel-based white-box cartoonizer model capable of converting real images into cartoon-style images.

Image Generation

Whisper Small ISSAI KSC 335RS V2

A small speech recognition model based on the Whisper architecture, suitable for domain-specific speech-to-text tasks

Speech Recognition

Mscoco Finetuned CoCa ViT L 14 Laion2b S13b B90k

An image-to-text model released under the MIT license, capable of converting image content into textual descriptions.

An open-source model released under the Apache-2.0 license; its specific functionality depends on the underlying model type.

Large Language Model

Unixcoder Base Unimodal

An open-source model released under the Apache-2.0 license; its specific functionality and application areas are unconfirmed.

Large Language Model

Distil Wav2vec2 Adult Child Cls 37m

An audio classification model based on the wav2vec 2.0 architecture, designed to distinguish between adult and child voices

Audio Classification

Wav2vec2 Xls R Tf Left Right Trainer

A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m, supporting left-right channel processing.

Speech Recognition

Distilhubert Ft Keyword Spotting

A keyword-spotting model based on the DistilHuBERT architecture, fine-tuned on the SUPERB dataset, with a reported accuracy of 97.06%.

Audio Classification

Xlm Roberta Base Finetuned Somali

Large Language Model

© 2025 AIbase